The Inclusion Problem for Regular Expressions

Author

  • Dag Hovland

Abstract

This paper presents a polynomial-time algorithm for the inclusion problem for a large class of regular expressions. The algorithm is not based on construction of finite automata, and can therefore be faster than the lower bound implied by the Myhill-Nerode theorem. The algorithm automatically discards irrelevant parts of the right-hand expression. The irrelevant parts of the right-hand expression might even be 1-ambiguous. For example, if r is a regular expression such that any DFA recognizing r is very large, the algorithm can still, in time independent of r, decide that the language of ab is included in that of (a + r)b. The algorithm is based on a syntax-directed inference system. It takes arbitrary regular expressions as input. If the 1-ambiguity of the right-hand expression becomes a problem, the algorithm will report this. Otherwise, it will decide the inclusion problem for the input.
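
The example in the abstract can be checked by hand with a small sketch. This is not the paper's inference system: it uses Brzozowski derivatives, a different and well-known technique, and it relies on the fact that the language of ab contains the single word "ab", so that inclusion in (a + r)b reduces to a membership test. The class names, the helper functions, and the concrete choice of r are all assumptions made for illustration.

```python
from dataclasses import dataclass


class Re:
    """Base class for a tiny regular-expression AST (illustrative only)."""


@dataclass(frozen=True)
class Empty(Re):   # matches no word at all
    pass


@dataclass(frozen=True)
class Eps(Re):     # matches only the empty word
    pass


@dataclass(frozen=True)
class Sym(Re):     # matches a single letter
    ch: str


@dataclass(frozen=True)
class Alt(Re):     # choice, written "+" in the abstract
    left: Re
    right: Re


@dataclass(frozen=True)
class Cat(Re):     # concatenation
    left: Re
    right: Re


@dataclass(frozen=True)
class Star(Re):    # Kleene star
    inner: Re


def nullable(e: Re) -> bool:
    """Return True if e matches the empty word."""
    if isinstance(e, (Eps, Star)):
        return True
    if isinstance(e, (Empty, Sym)):
        return False
    if isinstance(e, Alt):
        return nullable(e.left) or nullable(e.right)
    return nullable(e.left) and nullable(e.right)  # Cat


def deriv(e: Re, c: str) -> Re:
    """Brzozowski derivative: matches w exactly when e matches c + w."""
    if isinstance(e, (Empty, Eps)):
        return Empty()
    if isinstance(e, Sym):
        return Eps() if e.ch == c else Empty()
    if isinstance(e, Alt):
        return Alt(deriv(e.left, c), deriv(e.right, c))
    if isinstance(e, Star):
        return Cat(deriv(e.inner, c), e)
    # Cat: the letter is consumed by the left part, or by the right part
    # when the left part can match the empty word.
    first = Cat(deriv(e.left, c), e.right)
    return Alt(first, deriv(e.right, c)) if nullable(e.left) else first


def matches(e: Re, word: str) -> bool:
    """Membership test: derive by each letter, then check nullability."""
    for c in word:
        e = deriv(e, c)
    return nullable(e)


# Hypothetical stand-in for r; the abstract leaves r unspecified.
r = Star(Cat(Alt(Sym("a"), Sym("b")), Sym("a")))   # ((a + b)a)*
rhs = Cat(Alt(Sym("a"), r), Sym("b"))              # (a + r)b

print(matches(rhs, "ab"))  # True: the only word of ab lies in (a + r)b
print(matches(rhs, "aa"))  # False: every word of (a + r)b ends in b
```

Unlike the algorithm described in the abstract, this sketch still walks through r when taking derivatives, so its running time is not independent of r.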

Related papers

Complexity of Decision Problems for Simple Regular Expressions

We study the complexity of the inclusion, equivalence, and intersection problem for simple regular expressions arising in practical XML schemas. These basically consist of the concatenation of factors where each factor is a disjunction of strings possibly extended with ‘∗’ or ‘?’. We obtain lower and upper bounds for various fragments of simple regular expressions. Although we show that inclusi...

Complexity of Decision Problems for XML Schemas and Chain Regular Expressions

We study the complexity of the inclusion, equivalence, and intersection problem for XML schemas occurring in practice. These schemas make use of regular expressions with a very simple structure: they basically consist of the concatenation of factors, where each factor is a disjunction of strings, possibly extended with “∗”, “+”, or “?”. We refer to these as CHAin Regular Expressions (CHAREs). W...

Inclusion of Unambiguous RE#s is NP-Hard

We show that testing inclusion between languages represented by regular expressions with numerical occurrence indicators (#REs) is NP-hard, even if the expressions satisfy the requirement of “unambiguity”, which is required for XML Schema content model expressions. 1 Proof of the result We have seen before [3] that testing for inclusion and overlap of languages represented by #REs is NP-hard. T...

Optimizing Schema Languages for XML: Numerical Constraints and Interleaving

The presence of a schema offers many advantages in processing, translating, querying, and storage of XML data. Basic decision problems like equivalence, inclusion, and non-emptiness of intersection of schemas form the basic building blocks for schema optimization and integration, and algorithms for static analysis of transformations. It is thereby paramount to establish the exact complexity of ...

Regular Expressions with Numerical Occurrence Indicators - preliminary results

Regular expressions with numerical occurrence indicators (#REs) are used in established text manipulation tools like Perl and Unix egrep, and in the recent W3C XML Schema Definition Language. Numerical occurrence indicators do not increase the expressive power of regular expressions, but they do increase the succinctness of expressions by an exponential factor. Therefore methods based on straig...

Publication date: 2010